Scaling up to Billions of Cells with DATASPREAD: Supporting Large Spreadsheets with Databases

Authors

  • Mangesh Bendre
  • Vipul Venkataraman
  • Xinyan Zhou
  • Kevin Chen-Chuan Chang
  • Aditya Parameswaran

Abstract

Spreadsheet software is the tool of choice for ad-hoc tabular data management, manipulation, querying, and visualization, with adoption by billions of users. However, unlike database systems, spreadsheets are not scalable. We develop DATASPREAD, a system that holistically unifies databases and spreadsheets with the goal of working with massive spreadsheets: DATASPREAD retains all of the advantages of spreadsheets, including ease of use, ad-hoc analysis and visualization capabilities, and a schema-free nature, while also adding the scalability and collaboration abilities of traditional relational databases. We design DATASPREAD with a spreadsheet front-end and a regular relational database back-end. In this paper, to integrate spreadsheets and databases, we develop a storage and indexing engine for spreadsheet data. We first formalize and study the problem of representing and manipulating spreadsheet data within a relational database. We demonstrate that identifying the optimal representation is NP-hard, via a reduction from rectangle partitioning; however, under certain reasonable assumptions, the problem can be solved in PTIME. We develop a collection of mechanisms for representing spreadsheet data, and evaluate these representations on a workload of typical data manipulation operations. We augment our mechanisms with novel positionally-aware indexing structures that further improve performance. DATASPREAD can scale to billions of cells, returning results for common operations within seconds. Lastly, to motivate our research questions, we perform an extensive survey of spreadsheet use for ad-hoc tabular data management.
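The core tension the abstract describes is that spreadsheet cells are addressed by position, while relational rows are not. The sketch below is a minimal illustration, under our own assumptions, of two ideas in that space: storing only non-empty cells as (row, column, value) entries, and a positionally-aware index that maps display positions to stable row IDs, so that inserting a row does not rewrite the key of every row below it. The names `PositionalIndex`, `Sheet`, and `block_size`, and the row-column-value layout itself, are hypothetical choices for illustration; this is not DATASPREAD's actual representation or index structure.

```python
from itertools import count


class PositionalIndex:
    """Hypothetical sketch: maps 1-based display positions to stable row IDs.

    IDs live in small blocks; each block's length acts as its count, so
    finding position p walks block sizes instead of renumbering rows.
    """

    def __init__(self, block_size: int = 4) -> None:
        self.block_size = block_size
        self.blocks: list[list[int]] = [[]]

    def __len__(self) -> int:
        return sum(len(block) for block in self.blocks)

    def _locate(self, pos: int) -> tuple[int, int]:
        # Translate a 1-based position into (block number, offset in block).
        remaining = pos - 1
        for b, block in enumerate(self.blocks):
            if 0 <= remaining < len(block):
                return b, remaining
            remaining -= len(block)
        raise IndexError(pos)

    def insert(self, pos: int, row_id: int) -> None:
        # Place row_id so that it ends up at 1-based position pos.
        size = len(self)
        if not 1 <= pos <= size + 1:
            raise IndexError(pos)
        if pos == size + 1:
            b, off = len(self.blocks) - 1, len(self.blocks[-1])
        else:
            b, off = self._locate(pos)
        block = self.blocks[b]
        block.insert(off, row_id)
        if len(block) > self.block_size:
            # Split an overfull block in half, as a B-tree leaf would.
            half = len(block) // 2
            self.blocks[b:b + 1] = [block[:half], block[half:]]

    def delete(self, pos: int) -> int:
        # Remove and return the row ID at pos; drop blocks that become empty.
        b, off = self._locate(pos)
        row_id = self.blocks[b].pop(off)
        if not self.blocks[b] and len(self.blocks) > 1:
            del self.blocks[b]
        return row_id

    def row_id_at(self, pos: int) -> int:
        b, off = self._locate(pos)
        return self.blocks[b][off]


class Sheet:
    """Hypothetical row-column-value store: one entry per non-empty cell."""

    def __init__(self) -> None:
        self.rows = PositionalIndex()
        self.cells: dict[tuple[int, int], object] = {}
        self._ids = count(1)  # stable row IDs, never reused

    def insert_row(self, pos: int) -> None:
        self.rows.insert(pos, next(self._ids))

    def delete_row(self, pos: int) -> None:
        row_id = self.rows.delete(pos)
        # Only the deleted row's cells go away; no other cell key changes.
        for key in [k for k in self.cells if k[0] == row_id]:
            del self.cells[key]

    def set(self, row: int, col: int, value: object) -> None:
        self.cells[(self.rows.row_id_at(row), col)] = value

    def get(self, row: int, col: int) -> object:
        return self.cells.get((self.rows.row_id_at(row), col))


sheet = Sheet()
for _ in range(10):
    sheet.insert_row(len(sheet.rows) + 1)
sheet.set(5, 1, "x")
sheet.insert_row(1)     # new first row; no stored cell key is rewritten
print(sheet.get(6, 1))  # prints x: the cell now appears one position lower
```

Lookups here walk the blocks linearly, so each positional access costs O(n / block_size); stacking the per-block counts into a tree, as a counted B-tree does, would bring that down to O(log n).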

Similar resources

DataSpread: Unifying Databases and Spreadsheets

Spreadsheet software is often the tool of choice for ad-hoc tabular data management, processing, and visualization, especially on tiny data sets. On the other hand, relational database systems offer significant power, expressivity, and efficiency over spreadsheet software for data management, while lacking in the ease of use and ad-hoc analysis capabilities. We demonstrate DataSpread, a data ex...

Towards a Holistic Integration of Spreadsheets with Databases: A Scalable Storage Engine for Presentational Data Management

Spreadsheet software is the tool of choice for interactive ad-hoc data management, with adoption by billions of users. However, spreadsheets are not scalable, unlike database systems. On the other hand, database systems, while highly scalable, do not support interactivity as a first-class primitive. We are developing DATASPREAD, to holistically integrate spreadsheets as a frontend interface wit...

A new 2D block ordering system for wavelet-based multi-resolution up-scaling

A complete and accurate analysis of the complex spatial structure of heterogeneous hydrocarbon reservoirs requires detailed geological models, i.e. fine resolution models. Due to the high computational cost of simulating such models, single resolution up-scaling techniques are commonly used to reduce the volume of the simulated models at the expense of losing the precision. Several multi-scale ...

“It’s About the Idea Hitting the Bull’s Eye”: How Aid Effectiveness Can Catalyse the Scale-up of Health Innovations

Background Since the global economic crisis, a harsher economic climate and global commitments to address the problems of global health and poverty have led to increased donor interest to fund effective health innovations that offer value for money. Simultaneously, further aid effectiveness is being sought through encouraging governments in low- and middle-income countries (LMICs) to strengthen...

Scaling Up a Strengthened Youth-Friendly Service Delivery Model to Include Long-Acting Reversible Contraceptives in Ethiopia: A Mixed Methods Retrospective Assessment

Background Donor funded projects are small scale and time limited, with gains that soon dissipate when donor funds end. This paper presents findings that sought to understand successes, challenges and barriers that influence the scaling up and sustainability of a tested, strengthened youth-friendly service (YFS) delivery model providing an expanded contraceptive method choice in one locat...


Publication date: 2017